Detecting Webspam Beneficiaries Using Information Collected by the Random Surfer

Authors

  • Thomas Largillier
  • Sylvain Peyronnet
Abstract

Search engines use several criteria to rank webpages and choose which pages to display when answering a request. Those criteria can be separated into two notions: relevance and popularity. Popularity is calculated by the search engine and is related to the links made to the webpage. Malicious webmasters want to artificially increase their popularity; the techniques they use are often referred to as Webspam. Webspam can take many forms and is in constant evolution, but it usually consists of building a specific dedicated structure of spam pages around a given target page. It is important for a search engine to address the issue of Webspam; otherwise it won’t be able to provide users with fair and reliable results. In this paper we propose a technique to identify Webspam through the frequency language associated with random walks among those dedicated structures. We identify the language by calculating the frequency of appearance of k-grams on random walks launched from every node.
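
The last step of the abstract, counting k-grams along random walks launched from every node, can be illustrated with a minimal sketch. The abstract does not say which alphabet the k-grams are built over, so this sketch assumes each page carries a symbol supplied through a hypothetical `labels` mapping. The function names, the uniform choice among out-links, and the parameters `walks_per_node` and `walk_length` are likewise illustrative assumptions, not the authors' implementation.

```python
import random
from collections import Counter


def random_walk(graph, start, length, rng):
    """Follow uniformly chosen out-links from `start` for at most `length` steps."""
    walk = [start]
    node = start
    for _ in range(length):
        successors = graph.get(node, [])
        if not successors:  # dead end: stop the walk early
            break
        node = rng.choice(successors)
        walk.append(node)
    return walk


def kgram_frequencies(graph, labels, k=2, walks_per_node=20, walk_length=10, seed=0):
    """Relative frequency of each k-gram of labels seen on walks from every node.

    `labels` maps a node to its symbol; this alphabet is an assumption of the
    sketch, since the abstract does not define it.
    """
    rng = random.Random(seed)
    counts = Counter()
    for start in graph:
        for _ in range(walks_per_node):
            walk = random_walk(graph, start, walk_length, rng)
            symbols = [labels[node] for node in walk]
            # Slide a window of size k over the sequence of visited symbols.
            for i in range(len(symbols) - k + 1):
                counts[tuple(symbols[i:i + k])] += 1
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()} if total else {}


# Toy structure: two hypothetical spam pages pointing at one target page.
toy_graph = {"t": ["s1", "s2"], "s1": ["t"], "s2": ["t"]}
toy_labels = {"t": "T", "s1": "S", "s2": "S"}
toy_frequencies = kgram_frequencies(toy_graph, toy_labels, k=2)
```

On this toy structure every walk alternates between the target and a spam page, so only two 2-grams ever occur. A regular frequency profile of that kind is the sort of signal the abstract suggests looking for, though how the paper turns frequencies into a detection decision is not stated here.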


Similar resources

Effects of Awareness of Fertilizer Subsidy on the Yield of Crops among Rural Farmers in Ghana

  The study examines the effects of awareness of fertilizer subsidy on the yield of crops among rural farmers in Ghana. Random sampling was used to select six communities and 10 households per community. They include Bawku, Navrongo, Tolon kumbungu and Walewale from the Northern part and Ejura and Atebubu in the Southern part of Ghana. Primary data were collected from the sampled household by ...


Spatial Variability of Arsenic in Lands with Different Uses in Isfahan Province

Industrial, agricultural and urban activities have contaminated soil by heavy metals that can also increase concentration of the metals in food chains. This study was carried out in Isfahan province where lots of such activities are in progress. The purpose of this study was to determine spatial variability of Arsenic (As) in Isfahan soils. In this research, the soil samples (0-20 cm) were coll...


Machine Learning Methods for Spamdexing Detection

In this paper, we present recent contributions for the battle against one of the main problems faced by search engines: the spamdexing or web spamming. They are malicious techniques used in web pages with the purpose of circumventing the search engines in order to achieve good visibility in search results. To better understand the problem and finding the best setup and methods to avoid such virtua...


MCBS Highlights: Dually Eligible Medicare Beneficiaries

The Medicare Current Beneficiary Survey (MCBS) is a powerful tool for analyzing the Medicare population. Based on a stratified random sample, we can derive information about the health care use, expenditure, and financing of Medicare’s 37 million enrollees. We can also learn about those enrollees’ health status, living arrangements, and access to and satisfaction with care. The MCBS allows for detailed analysis of the dually e...


The Generalized Web Surfer

Different models have been proposed for improving the results of Web search by taking into account the link structure of the Web. The PageRank algorithm models the behavior of a random surfer alternating between random jumps to new pages and following out links with equal probability. We propose to improve on PageRank by using an intelligent surfer that combines link structure and content to de...



Journal:
  • IJOCI

Volume 2, Issue

Pages -

Publication date: 2011